Second version of encTeX: UTF-8 support
Abstract
The UTF-8 encoding keeps the standard ASCII characters unchanged and encodes the accented letters of our alphabets in two bytes. Standard 8-bit TeX is not ready for UTF-8 input because it has to treat such a single character as two tokens. This means you cannot set the \catcode, \uccode, etc. of these characters, and you cannot use \futurelet to look at the next character in the usual sense. The second version of my encTeX solves these problems. encTeX is fully backward compatible with the original TeX. It adds ten new primitives with which you can set or read the conversion tables used by TeX's input processor and those used during output to the terminal, the log and \write files. The second version makes it possible to convert multi-byte sequences to one byte or to a control sequence. You can implement up to 256 UTF-8 codes as one byte and an unlimited number of other UTF-8 codes as control sequences. All internals of 8-bit TeX work in the same way as if a “normal one-byte encoding” of the input files were used. I think that the UTF-8 encoding will become more common. In that situation, there is no other way than to modify the input processor of TeX; otherwise 8-bit TeX will be dead in a short time.
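The conversion described in the abstract can be pictured with a small Python model. This is only a sketch under stated assumptions, not encTeX's implementation or its primitive syntax: the names `build_table` and `convert_line`, the internal slot `0xE1`, and the control sequence name `\euro` are all hypothetical choices made for illustration. It shows a table that maps a multi-byte UTF-8 sequence either to one internal byte or to a control sequence, while leaving every other byte untouched.

```python
from __future__ import annotations

from typing import Union

# Hypothetical model of an input conversion table; not encTeX's real code.
Token = Union[int, str]  # int: one internal byte; str: a control sequence name


def build_table(entries: dict[str, Token]) -> dict[bytes, Token]:
    """Key each entry by the UTF-8 byte sequence of its character."""
    table: dict[bytes, Token] = {}
    for char, target in entries.items():
        if isinstance(target, int) and not 0 <= target <= 255:
            raise ValueError("a one-byte target must fit in 0..255")
        table[char.encode("utf-8")] = target
    return table


def convert_line(raw: bytes, table: dict[bytes, Token]) -> list[Token]:
    """Replace known multi-byte sequences; pass every other byte through."""
    longest = max((len(key) for key in table), default=1)
    out: list[Token] = []
    i = 0
    while i < len(raw):
        # Try the longest candidate first so a 3-byte code beats a shorter prefix.
        for size in range(min(longest, len(raw) - i), 0, -1):
            target = table.get(raw[i:i + size])
            if target is not None:
                out.append(target)
                i += size
                break
        else:
            out.append(raw[i])  # unchanged byte, e.g. plain ASCII
            i += 1
    return out


# Illustrative assumptions: 0xE1 is an arbitrary internal slot for "á",
# and "\\euro" is an arbitrary control sequence name for the euro sign.
table = build_table({"á": 0xE1, "€": "\\euro"})
tokens = convert_line("má €".encode("utf-8"), table)
print(tokens)
```

In this model the two bytes of "á" collapse into the single value 225, so later stages see one character, while the three bytes of "€" become one named control sequence. The ASCII bytes for "m" and the space pass through unchanged.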
Similar resources
Providing some UTF-8 support via inputenc
3 Mapping characters, based on font (glyph) encodings; 3.1 About the table itself; 3.2 The mapping table; 3.3 Notes; 3.4 Mappings for OT1 glyphs; 3.5 Mappings for OMS g...
Putting the Cork back in the bottle: Improving Unicode support in TeX
Until recently, all of the hyphenation patterns available for different languages in TeX were using 8-bit font encodings, and were therefore not directly usable with UTF-8 TeX engines such as XeTeX and LuaTeX. When the former was included in TeX Live in 2007, Jonathan Kew, its author, devised a temporary way to use them with XeTeX as well as the “old” TeX engines. Last spring, we undertook to c...
Package ‘Rmalschains’ August 29, 2013
August 29, 2013 Maintainer Christoph Bergmeir License GPL-3 | file LICENSE Title Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R LinkingTo Rcpp Type Package LazyLoad yes Author Christoph Bergmeir, Daniel Molina, José M. Benítez Description This package implements an algorithm family for continuous optimization called memet...
Package ‘cp4p’
May 16, 2016 Type Package Title Calibration Plot for Proteomics Version 0.3.5 Date 2016-05-11 Author Quentin Giai Gianetto, Florence Combes, Claire Ramus, Christophe Bruley, Yohann Couté, Thomas Burger Maintainer Quentin Giai Gianetto Description Functions to check whether a vector of p-values respects the assumptions of FDR (false discovery rate) control procedures and to ...
1-0 Transformation Form of UTF-8
Based on Multilevel Mark Theory, the 11-10 and 1-0 transformation forms of UTF-8 are proposed in this paper. The transformation between UCS and the 1-0 form of UTF-8 is introduced, and then the transformation between Local Code and the 1-0 form of UTF-8 is discussed in detail.